Marvin Martin & Aflak Michel Omar (ING5 BDA Gr01A)
26/11/2020
Airbnb daily data is very valuable. An investor eager to make key decisions about the best available real estate option can use this data to generate profit. In this project, we use scraped data from 6 countries, aggregated over a period of time (for each city in these countries, we kept the 3 latest scrape dates):
These datasets can be downloaded from http://insideairbnb.com/get-the-data.html. A CSV file listing all the scraped URLs, ready to download, is available in data/all_data_urls.csv.
Because these datasets are huge, we did some preprocessing to focus on the important information and keep the data volume reasonable (to fit computation and time constraints). We went through several steps:
################### Code From utils/tools.R ##################################
urls <- read.csv(file.path("./data/all_data_urls.csv")) # Step 1
df <- extract_all_meta(urls) # Step 2
latest_dates <- 3 # Step 3
countries <- c("france", "spain", "the-netherlands", "germany", "belgium", "italy") # Step 4
download_data(df, countries, latest_dates) # Step 5
We reduced the data size from several GB to only a few hundred MB. We are now ready to play with it!
Starting with raw data, we went through several steps:
[Step 1] Load the CSV containing the provided URLs and metadata (read.csv).
[Step 2] Extract “country”, “region”, “city”, “date” and “url” from the CSV into a data frame (extract_all_meta).
[Step 3] Specify the number “n” of latest scraping dates you are looking for.
[Step 4] Select the list of 6 countries you want to work on.
[Step 5] Go through this data frame line by line and run the following steps (download_data and prepare_data):
[Step 5 - Remark] This big step produces a CSV file for every city of the listed countries, stored as /data/countries/listings_CITY_NAME_date.csv.
We could have avoided writing files, but since this step takes more than 10 minutes, we preferred to keep them on disk.
[Step 6] Get the final preprocessed dataset by merging all the city CSVs into a single data frame (load_global_listings).
This step runs at server startup and takes around 20 seconds.
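Step 6 could look like the following minimal sketch (an assumption: the real load_global_listings in utils/tools.R may differ, e.g. using data.table::fread for speed; this shows the merge pattern with base R only):

```r
# Hypothetical sketch of load_global_listings: stack every per-city CSV
# written by Step 5 into a single data frame.
load_global_listings <- function(dir = "./data/countries") {
  files <- list.files(dir, pattern = "^listings_.*\\.csv$", full.names = TRUE)
  # Read each city file, then bind all of them row-wise
  do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
}
```

With data.table, the lapply/rbind pair would become fread plus rbindlist, which is noticeably faster on hundreds of MB.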
Here is the shape of our dataset:
# Publications: 1321825
# Features: 21
Features names are:
## - id
## - country
## - region
## - city
## - date
## - neighbourhood_cleansed
## - latitude
## - longitude
## - property_type
## - room_type
## - accommodates
## - bedrooms
## - beds
## - price
## - minimum_nights
## - maximum_nights
## - review_scores_rating
## - availability_30
## - price_30
## - revenue_30
## - latitudelongitude
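The last columns are derived rather than scraped. The report does not define them, so the formulas below are assumptions: price_30 and revenue_30 are treated as 30-day estimates based on availability_30, and latitudelongitude as the "lat:long" string column that googleVis map charts expect:

```r
# Assumed derivations (not confirmed by the report) on a toy listings frame
listings <- data.frame(
  price = c("$100.00", "$1,250.00"),  # raw scraped prices are strings
  availability_30 = c(10, 0),         # open nights over the next 30 days
  latitude = c(48.85, 40.41), longitude = c(2.35, -3.70)
)
listings$price <- as.numeric(gsub("[$,]", "", listings$price))  # "$1,250.00" -> 1250
booked_30 <- 30 - listings$availability_30           # nights assumed booked
listings$price_30   <- listings$price * 30           # revenue ceiling if fully booked
listings$revenue_30 <- listings$price * booked_30    # estimated 30-day revenue
# googleVis map functions take a single "lat:long" column
listings$latitudelongitude <- paste(listings$latitude, listings$longitude, sep = ":")
```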
Tab 1 - Analysis comparing several cities
Tab 2 - Analysis of a single city
We use several libraries (web app, graphics, data manipulation) to build this project:
shiny, googleVis, ggplot2, dplyr, data.table, stringr and glue
################### Code From shinyApp/ui.R ##################################
# THIS IS PSEUDO-CODE !!!
fluidPage
tabsetPanel
tabPanel # Analysis 1 Tab
sidebarLayout
sidebarPanel # Tool Bar
Checkbox, selectInput, uiOutput, ...
mainPanel # Plots
htmlOutput, plotOutput ...
tabPanel # Analysis 2 Tab
sidebarLayout
sidebarPanel # Tool Bar
Checkbox, selectInput, uiOutput, ...
mainPanel # Plots
htmlOutput, plotOutput ...
################### Code From shinyApp/server.R ##################################
# THIS IS PSEUDO-CODE !!!
listings <- load_global_listings() # Load preprocessed data
# Server
server
# Tab 1 variables
reactive # Reactive DataFrame (filter by country / cities / features)
renderUI # UI sent from server to uiOutput (checkbox, selectInput, dateSlider)
renderGvis, renderPlot # Plots sent from server to htmlOutput, plotOutput (histogram, ...)
# Tab 2 variables
reactive # Reactive DataFrame (filter one city)
renderUI # UI sent from server to uiOutput (checkbox, selectInput, dateSlider)
renderGvis, renderPlot # Plots sent from server to htmlOutput, plotOutput (map, ...)
Each tab is split into two vertical parts: Tool Bar and Plots.
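The reactive-filter pattern sketched above can be condensed into a minimal runnable app (an assumption: the column names mirror our dataset, and the inline data frame stands in for load_global_listings()):

```r
library(shiny)

# Stand-in for load_global_listings(); real data has ~1.3M rows
listings <- data.frame(
  city = c("Paris", "Paris", "Madrid"),
  revenue_30 = c(900, 3000, 1200)
)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(selectInput("city", "City", choices = unique(listings$city))),
    mainPanel(plotOutput("hist"))
  )
)

server <- function(input, output) {
  # Reactive data frame, re-filtered whenever the sidebar selection changes
  filtered <- reactive(listings[listings$city == input$city, ])
  output$hist <- renderPlot(hist(filtered()$revenue_30, main = input$city))
}

# shinyApp(ui, server)  # uncomment to launch interactively
```

Every plot reads from filtered(), so a single sidebar change propagates to all outputs in the tab, which is the same structure our two tabs use at larger scale.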
You can: